(54) Title: METHOD AND SYSTEM FOR MODELING A LEGACY COMPUTER SYSTEM

(57) Abstract: A method and system for modifying program applications of a legacy computer system to directly output data in
XML format models the legacy computer system, maps the model to an XML schema and automatically modifies one or more
applications to directly output XML formatted data in cooperation with a writer engine and a context table. A modeling engine lists
the incidents within the applications that write data and generates a report data model. The report data model includes statically
determined value or type of the data fields and is written in a formal grammar that describes how the write operations are combined.
A modification specification is created to define modifications to the legacy computer system applications that relate applications
that write data to the XML schema. A code generation engine then applies the modification specification to the applications to write
modified applications that, in cooperation with a writer engine and context table, directly output XML formatted data from the legacy
computer system without a need for transforming the data.

METHOD AND SYSTEM FOR MODELING A LEGACY COMPUTER SYSTEM

TECHNICAL FIELD

This invention relates in general to the field of
computer systems, and more particularly to a method and
system for reporting XML data from a computer system.

CROSS REFERENCE TO RELATED APPLICATIONS

This application relates to U.S. Patent Application
Serial Number , entitled METHOD AND SYSTEM FOR
APPLYING XML SCHEMA, by Ballantyne, et al., filed
concurrently with this application.

This application relates to U.S. Patent Application
Serial Number , entitled METHOD AND SYSTEM FOR
REPORTING XML DATA FROM A LEGACY COMPUTER SYSTEM, by
Ballantyne, et al., filed concurrently with this
application.

BACKGROUND OF THE INVENTION 

The Internet and e-commerce are rapidly reshaping
the way that the world does business. In addition to
direct purchases made through the Internet, consumers
increasingly depend upon information available through
the Internet to make purchasing decisions. Businesses
have responded by allowing greater access to information
through the Internet both directly to consumers and to
other businesses such as suppliers. One result of the
increased access to electronic information through the
Internet is a decreased dependency on and desire for
printed "hard copy" information.

Extensible Markup Language ("XML") provides an
excellent tool for business-to-business electronic
commerce and publication of data via the Internet. XML
specifies a format that is easily adapted for data
transmission over the Internet, direct transfer as an
object between different applications, or the direct
display and manipulation of data via browser technology.
Currently, complex transformations are performed on data
output in legacy computer system formats in order to put
the data in XML format.

One example of the transformation from written
reports typically output by legacy computer systems to
electronic reports is the telephone bill. Historically,
telephone companies have relied on mainframe or legacy
computer systems running COBOL code to track and report
telephone call billing information. Typically, these
legacy computer system reports are printed, copied and
distributed to those who need the information. However,
conventional legacy computer system report formats are
difficult to transmit or manipulate electronically. Yet,
the electronic distribution of bills, such as through
e-mail, a biller's web site or at a bill consolidator
chosen by the consumer, enhances flexibility and control
of bill payment, especially with complex business
invoices.

Generally, in order to make conventional legacy
reports available in different formats, a complex
transformation of the data is performed based on a report
print stream. One transformation technique is to write a
"wrapper" around the legacy computer system. The wrapper
includes parsers and generators that transform legacy
computer system reports into XML formatted output.
Parsers apply a variety of rules to identify and tag data
output in a legacy report. For example, a parser might
determine that a data field of a telephone bill
represents a dollar amount based on the presence of a
dollar sign or the location of a decimal point in the
data field, or that a data field represents a customer
name due to absence of numbers. Once the parser
deciphers the legacy report, a generator transforms the
legacy computer system data into appropriately tagged XML
format.

Although the end result of the parsing and
transforming process is data in an XML format, the
process itself is difficult and expensive to implement
and cumbersome to maintain. Without careful study of
underlying program logic, it is generally not possible to
reliably determine all potential outputs from the legacy
computer system. In particular, even a fairly large
output sample is almost certain to be incomplete in that
some program logic is only rarely exercised. Another
difficulty with the parsing and transforming process is
that, as changes are made to the underlying program
applications of the legacy computer system, the parsing
and transforming systems generally require updates that
mirror the underlying changes. These downstream changes
increase the time and expense associated with maintaining
the legacy computer system, and also increase the
likelihood of errors being introduced into the XML
formatted output.
Another difficulty associated with the use of XML is
that, although XML dramatically improves the utility of
output data, the generation of XML output depends upon
underlying programs that adhere to an exacting data
structure. For instance, the generation of syntactically
correct XML requires adherence to a rigid labeled tree
structure so that output data is identified by "tags" and
"end tags" associated with the XML data structure as
defined by an XML schema. When writing a deeply embedded
element of an XML tree, such as a subschema within a
defined XML schema, tags corresponding to all of that
element's ancestor elements must also be written. When
writing another element that is not part of a current XML
subschema, the current subschema must be closed off to an
appropriate level with balancing closing end tags for the
ancestor elements. XML schemas also specify type and
cardinality constraints on their elements. Thus,
substantial and exacting bookkeeping of programs that
output XML is necessary with respect to the XML schema in
order to minimize errors on the part of programmers.

SUMMARY OF THE INVENTION 

Therefore, a need has arisen for a method and system
which rapidly and automatically modifies legacy computer
systems to produce output in an XML format.

A further need exists for a method and system which
modifies legacy computer systems to produce output in XML
format without altering the underlying legacy computer
system program logic or business rules.

A further need exists for a method and system which
determines write operations of a legacy computer system
to allow modification of those nodes so that the legacy
computer system outputs data in XML format.

A further need exists for a method and system which
generates syntactically correct XML output with automated
bookkeeping to minimize programming errors.

In accordance with the present invention, a method
and system is provided that substantially eliminates or
reduces disadvantages and problems associated with
previously developed methods and systems that transform
the output from legacy computer systems into an XML
format. The present invention provides XML output by
modifying the underlying legacy computer system program
applications to report data in XML format instead of
transforming the output from the legacy computer system
after the data is reported in the format of the legacy
computer system.

More specifically, a code generation engine
automatically modifies legacy computer system program
applications to create modified legacy program
applications. The modified legacy program applications
are run on the legacy computer system so that the data
output from the legacy computer system is in XML format.
The modified legacy program applications are written in
the computer language of the legacy computer system so
that the legacy computer system directly produces an XML
version of its output without the need to alter the logic
or business rules embodied in the unmodified program
applications of the legacy computer system.

The code generation engine creates the modified
program applications in accordance with a modification
specification created by a mapping engine. The mapping
engine generates the modification specification and
context table by mapping a model of write operations of
the legacy computer system to an XML schema. The mapping
engine provides the modification specification to the
code generation engine. The code generation engine
creates modified legacy computer system program
applications for use on the legacy computer system. A
writer engine is an application program loaded on the
legacy computer system and written in the language of the
legacy computer system. The writer engine is called by
the modified program applications to write XML output in
the format of the XML schema encoded by the context
table.

The model used by the mapping engine is generated by
a modeling engine which analyzes the legacy computer
system to identify and model the write operations, such
as with a report data model. The modeling engine
determines a list of legacy computer system program
applications that report data. The program applications
that report data are further analyzed to determine the
incidents within each program application at which a
write operation exists. A report data model is then
compiled with a value and/or type for the data fields of
each incident. The report data model is augmented by a
formal grammar that simplifies the process of relating
write operations to execution paths of legacy computer
system program applications.

Once the modified program application is loaded on
the legacy computer system, the legacy computer system
continues to perform its functional operations without
change to the underlying business or program logic. When
a legacy computer system program application commands the
reporting of data, modified instructions provided in the
modified program application call the writer engine to
output syntactically correct XML data. The writer engine
determines the current context of XML output and opens
appropriate schema element data structures in conjunction
with the context table. The writer engine then analyzes
the current schema element data structure and the called
schema element to determine the relationship of the
called schema element with the current schema element.
If the called schema element is a descendant of the
current schema element, the writer engine opens the
schema element ID tags down through the called schema
element and outputs the data from the schema element in
syntactically correct XML format. If the schema element
is not a descendant of the current schema element, the
writer engine finds a mutual ancestor having consistent
cardinality, closes the schema element ID tags up to the
ancestor schema element and proceeds to open the schema
element ID tags down through the called schema element to
output data in syntactically correct XML format. In
addition, the writer engine supports delayed printing of
tags and attributes until such time as a complete
syntactic unit is available.
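
The tag bookkeeping described above can be sketched in Python. This is only an illustration under stated assumptions, not the writer engine of the invention: the context table is reduced to a hypothetical child-to-parent map, the element names "total-calls" and "summary" are invented for the example, and cardinality checks and delayed printing are omitted.

```python
from typing import Dict, List, Optional
from xml.sax.saxutils import escape

# Hypothetical context table: each schema element mapped to its parent
# (None marks the root). "total-calls" and "summary" are invented names.
PARENT: Dict[str, Optional[str]] = {
    "bill": None,
    "detail-list": "bill",
    "detail-by-phone": "detail-list",
    "total-calls": "detail-by-phone",
    "summary": "bill",
}


def path_from_root(element: str) -> List[str]:
    """Return the chain of elements from the root down to `element`."""
    path: List[str] = []
    node: Optional[str] = element
    while node is not None:
        path.append(node)
        node = PARENT[node]
    path.reverse()
    return path


class WriterEngine:
    """Tracks the open elements so that every write stays well formed."""

    def __init__(self) -> None:
        self.open_elements: List[str] = []  # current XML context
        self.output: List[str] = []

    def write(self, element: str, data: str) -> None:
        ancestors = path_from_root(element)[:-1]
        # Find how much of the current context is shared with the target.
        shared = 0
        while (shared < len(self.open_elements)
               and shared < len(ancestors)
               and self.open_elements[shared] == ancestors[shared]):
            shared += 1
        # Close tags up to the mutual ancestor.
        while len(self.open_elements) > shared:
            self.output.append(f"</{self.open_elements.pop()}>")
        # Open tags down through the called element's ancestors.
        for name in ancestors[shared:]:
            self.output.append(f"<{name}>")
            self.open_elements.append(name)
        self.output.append(f"<{element}>{escape(data)}</{element}>")

    def finish(self) -> str:
        """Close every element still open and return the document text."""
        while self.open_elements:
            self.output.append(f"</{self.open_elements.pop()}>")
        return "".join(self.output)
```

Writing "total-calls" and then "summary" with this sketch opens three ancestor tags for the first call, closes two of them before the second call, and closes the root in finish().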
The present invention provides a number of important
technical advantages. One important technical advantage
is the ability to rapidly and automatically modify legacy
computer system program applications to enable them to
directly produce an XML version of their data output. By
modifying the underlying legacy computer system program
applications, XML output is made available directly from
the legacy computer system without a transformation of
the data itself from a legacy computer system format.
Further, the underlying program logic and business rules
remain unaltered so that the substantive functions of the
legacy computer system need not change. Thus, a business
enterprise using a legacy computer system is provided
with the greater accessibility to data provided by output
in XML format without affecting computed values.

Another important technical advantage of the present
invention is that modification of the underlying legacy
computer program applications is operationally less
expensive, complex and time-consuming than transformation
of legacy computer system output to an XML format. For
instance, once modified program applications are running
on the legacy computer system, XML formatted output is
available without further action to the data. By
comparison, transformation of output to an XML format
after the data is reported by the legacy computer system
requires action with each data report. Thus, if any
changes are made to the underlying legacy program
applications, changes must also generally be made to
transformation applications that mirror the underlying
changes. This further complicates the maintenance of the
legacy computer system.

Another important technical advantage of the present
invention is that, whether or not used with a legacy
computer system, the writer engine and context table aid
in the generation of syntactically correct XML output.
For instance, the writer engine ensures that a command to
write an embedded XML element will include tags
corresponding to all of the embedded element's ancestor
elements. Also, when an XML element is written that is
not part of the current XML subschema, the writer engine
will close off the current XML subschema to an
appropriate level of an ancestor schema element.
Automation of the bookkeeping involved with the XML
schema eliminates the risk of syntactic errors associated
with XML reports. The delayed printing feature provides
a mechanism whereby a program can generate correct XML
data even when the sequence of print commands in the
original legacy system application program does not map
directly onto the order of XML elements prescribed by the
XML schema.

Another important advantage of the present invention
is that tool support manages the complexity of modeling
the underlying program logic, resulting in substantially
reduced time and expense for modification of a legacy
computer system to output XML formatted data. Tools aid
in: the determination of the control flow graph of legacy
applications; the abstraction out of this graph of a
subgraph specifically related to the writing of report
lines; the identification of constants and data items
that flow into print lines so that the elements that need
to be written as tagged XML can be readily identified;
and the identification of domain specific information
such as locations of headers and footers. Automation
through tool support greatly enhances management of
program complexity.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present
invention and advantages thereof may be acquired by
referring to the following description taken in
conjunction with the accompanying drawings, in which like
reference numbers indicate like features, and wherein:

Figure 1 depicts a block diagram of a code
generation system in communication with a legacy computer
system;

Figure 2 depicts a flow diagram of the generation of
modified legacy program applications to output XML data;

Figure 3 depicts a flow diagram of the generation of
a model of the write operations of a legacy program
application;

Figure 4 depicts a sample output of a legacy
computer system report for a telephone bill;

Figure 5 depicts XML formatted data corresponding to
the legacy computer system report depicted by Figure 4;

Figure 5A depicts an XML schema for the output
depicted in Figure 5;

Figure 6 depicts a graphical user interface for
mapping legacy computer system code to an Extensible
Markup Language schema and report data model;

Figure 6A depicts underlying COBOL code modeled by
the report data model of Figure 6;

Figure 7 depicts a sample Extensible Markup Language
schema for outputting address data;

Figure 7A depicts a tree structure for the schema of
Figure 7;

Figure 7B depicts a computed data context table for
the schema depicted by Figure 7; and

Figure 8 depicts a flow diagram of an XML print
operation that ensures generation of syntactically
correct Extensible Markup Language data output.

DETAILED DESCRIPTION OF THE INVENTION 
Preferred embodiments of the present invention are
illustrated in the figures, like numerals being used to
refer to like and corresponding parts of the various
drawings.

In order to take advantage of the opportunities
provided by the use of XML as a medium for e-commerce,
businesses will eventually have to either replace
existing legacy computer systems or re-write the
applications on the legacy computer systems. However,
businesses have substantial investments in their existing
legacy computer systems and related applications so that
wholesale replacement of these systems and applications
is not practical in the short term. Legacy computer
systems perform essential functions such as billing,
inventory control, and scheduling that need massive
on-line and batch transaction processing. Legacy computer
system applications written in languages such as COBOL
will remain a vital part of the enterprise applications
of many large organizations for the foreseeable future.
In fact, this installed base of existing software
represents the principal embodiment of many
organizations' business rules. Although, in principle,
these applications could be hand-modified to output data
in XML format, in reality the underlying logic of even a
simple report application can be difficult to understand
and decipher.

Therefore, a tremendous challenge facing many
businesses is the rapid and inexpensive adaptation of
existing computer systems to take advantage of the
opportunities presented by electronic commerce. Even
when installing new and updated computer systems, the
ever-evolving nature of electronic commerce demands that
businesses incorporate flexibility as a key component for
new computer systems. XML has become a popular choice
for reporting data due to the ease with which XML adapts
to essential e-commerce functions, such as transmission
over the Internet, direct transfer as an object between
different applications and display and manipulation via
browser technology. XML's flexibility results from its
inclusion of named tags bracketing data that identify
the data's relationship within an XML schema. However,
implementation of XML data reports relies on accurate use
of tags to define the output data within the XML schema.
Thus, computer systems that implement XML adhere to the
XML schema and use exact bookkeeping to obtain accurate
reports.

The present invention aids in the implementation of
XML for reports, both by the modification of legacy
computer system program applications to output XML data
and by the tracking of XML output within an XML schema to
ensure an accurate output, whether or not the XML data
originates with a legacy computer system. Referring now
to Figure 1, a block diagram depicts a computer system 10
that modifies a legacy computer system 12 to output data
in XML format. A code generation system 14 interfaces
with legacy computer system 12 to allow the analysis of
one or more legacy program applications 16 and the
generation of one or more modified legacy program
applications 18. Code generation system 14 also provides
a writer engine 20 and context table 22 to legacy
computer system 12. Legacy computer system 12 is then
able to directly output XML formatted data when modified
legacy program applications 18 call writer engine 20 in
cooperation with context table 22 to output syntactically
correct XML data.

Code generation system 14 includes a code generation
engine 24, a mapping engine 26 and a modeling engine 28.
Modeling engine 28 interfaces with legacy computer system
12 to obtain a copy of legacy program applications 16 for
automated review and modeling. Modeling engine 28
generates a list of incidents for points in the program
at which data is written. For instance, modeling engine
28 may search the source code of the legacy program
applications for reporting or writing commands for
selected output streams. The list of report incidents
is used to model the report functions of the legacy
computer system such as by a report data model that lists
the values and types of written data fields from the
legacy program applications 16. The list of report
incidents is then augmented by a formal grammar that is
used to relate the XML schema to the output reported by
the legacy program applications. The list of report
incidents and the formal grammar are two components of
the report data model for the legacy system application
program. Intuitively, an incident describes a line in a
report, and the formal grammar describes how the
application program sequences those lines to form a
report.

Modeling engine 28 provides the report data model
identifying report incidents in the legacy program
applications 16 to mapping engine 26 and modeling/mapping
graphical user interface 30. Mapping engine 26 maps the
report incidents from the report data model to the XML
schema 32 and this relationship between the report data
model and XML schema 32 is displayed on modeling/mapping
graphical user interface 30. By establishing the
relationship between the report incidents of legacy
program application 16 and the XML schema 32, mapping
engine 26 defines a specification for modification of the
legacy program applications 16 to output XML data.
Modeling/mapping graphical user interface 30 provides
information to programmers of the modification
specification. Modeling/mapping graphical user interface
30 produces a modification specification and a context
table 22. Optionally, the modeling/mapping graphical
user interface 30 allows programmers to create or modify
an XML schema.

Code generation engine 24 accepts the modification
specification, a copy of the legacy program applications
16, and context table 22 to generate modified legacy
program applications 18. Based on the modification
specification, code generation engine 24 generates source
code in the computer language of the legacy computer
system that is inserted in legacy program applications 16
to command output of XML data and saves the modified
source code as modified legacy program applications 18.
The modified legacy program applications 18 may continue
to maintain the legacy computer system report
instructions so that the modified program applications 18
continue to report data in the legacy computer system
format in addition to the XML format. The outputting of
both formats aids in quality control by allowing a direct
comparison of data from modified and unmodified code.
Alternatively, the modified instructions provided by code
generation engine 24 may replace report instructions of
legacy program applications 16 so that modified legacy
program applications 18 report data exclusively in XML
format. Writer engine 20 is written in a computer
language of legacy computer system 12 and references
context table 22 to determine the appropriate XML schema
elements for output of data from legacy system 12. The
modified code in modified legacy program applications 18
calls writer engine 20 when outputting data in XML
format.

Referring now to Figure 2, a simplified flow diagram
depicts the process of generation of modified legacy
program applications that output data in XML format. The
process begins at step 34, in which the legacy code of the
legacy program applications 16 is made available to code
generation system 14. For example, a mainframe legacy
computer system running COBOL source code downloads a
copy of the source code to code generation system 14 for
analysis and generation of modified code.

At step 36, code generation system 14 models the
legacy program applications to provide a report data
model of the write incidents and their underlying grammar
from the legacy program applications' code. For
instance, a report data model identifies the incidents
within the code of legacy program applications 16 at
which data to selected output devices are written,
including the values and types of the data. At step 38,
the report data model is used to generate a modification
specification. The modification specification is
generated in conjunction with an XML schema provided at
step 40 that defines the data structure for write
instructions of the modified legacy program applications
18 to output XML data.

At step 42, the modification specification is used
to automatically generate modified legacy code to be run
on the legacy computer system 12. The modified legacy
code is run at step 44 so that the modified legacy
program applications emit output from legacy system 12 in
XML format without requiring further transformation of
the output data.

The process of modeling legacy computer system 12 is
shown in greater detail by reference to Figure 3.
Modeling engine 28 extracts a report data model of legacy
program applications 16 through an automated analysis of
the underlying legacy code. The automated analysis
provides improved understanding of the operation of the
legacy code and reduces the likelihood of errors
regarding the operation and maintenance of the underlying
legacy code. Essentially, modeling engine 28 parses the
legacy software process into rules to graph its control
flow. An abstraction of the control flow produces a
report data model that allows understanding of data types
and invariant data values written at each write
instruction in the report data model. The report data
model, when combined with the values and typing of
written data fields, provides a model of legacy program
applications 16.

Referring to Figure 3, the modeling process starts
at step 46 through a determination of the legacy
programs' control flow graph. The control flow graph of
a particular legacy program application is a directed
graph (N, A) in which N contains a node for each
execution point of the program application and A contains
an arc <n_1, n_2>, where n_1 and n_2 are elements of N, if
the legacy program application is able to move immediately
from n_1 to n_2 for some possible execution state.
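
As a minimal sketch of this definition, the directed graph (N, A) can be held as a set of nodes and a set of ordered pairs. The node names and the successor map below are hypothetical and stand in for execution points of a small report program; they are not taken from the figures.

```python
from typing import Dict, Set, Tuple

Node = str
Arc = Tuple[Node, Node]

# Hypothetical execution points and their immediate successors.
EXAMPLE_SUCCESSORS: Dict[Node, Set[Node]] = {
    "start": {"read"},
    "read": {"write_detail", "write_total"},
    "write_detail": {"read"},
    "write_total": {"stop"},
}


def build_cfg(successors: Dict[Node, Set[Node]]) -> Tuple[Set[Node], Set[Arc]]:
    """Build (N, A): A holds <n_1, n_2> when control can move from n_1 to n_2."""
    nodes: Set[Node] = set(successors)
    for targets in successors.values():
        nodes |= targets
    arcs: Set[Arc] = {
        (n1, n2) for n1, targets in successors.items() for n2 in targets
    }
    return nodes, arcs
```

Here the arc ("read", "write_detail") is in A because the program can move directly from one point to the other, while ("start", "stop") is not.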

At step 48, the write operations of the control flow
graph are determined to obtain a data file control graph.
Essentially, the control flow graph is abstracted to
contain only start nodes, stop nodes, and nodes writing
to selected data files. This results in a data file
control graph that identifies the write incidents in the
legacy program applications. The data file control graph
abstracted from a control flow graph (N, A) is a directed
graph (N_R, A_R). A node n is in the set of nodes N_R if the
node n starts a legacy program application, stops a
legacy program application or writes to a data file. The
arc <n_1, n_m> is in A_R if both n_1 and n_m are in the set
of nodes N_R and a sequence of arcs <n_1, n_2>, <n_2, n_3>,
..., <n_(m-1), n_m> exists in A where, for i from 2 to m-1,
n_i is not in the set of nodes N_R.
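
The abstraction step can be sketched as a graph search that starts from each retained node and follows arcs only through nodes that are not retained. This is an illustrative sketch rather than the modeling engine's procedure, and the node names in the usage note are hypothetical.

```python
from typing import Dict, Set, Tuple

Node = str
Arc = Tuple[Node, Node]


def abstract_graph(
    nodes: Set[Node], arcs: Set[Arc], retained: Set[Node]
) -> Tuple[Set[Node], Set[Arc]]:
    """Return (N_R, A_R) for the start, stop and write nodes in `retained`."""
    successors: Dict[Node, Set[Node]] = {}
    for n1, n2 in arcs:
        successors.setdefault(n1, set()).add(n2)

    n_r = retained & nodes
    a_r: Set[Arc] = set()
    for source in n_r:
        # Walk forward from `source`, passing only through nodes outside N_R.
        pending = list(successors.get(source, ()))
        visited: Set[Node] = set()
        while pending:
            node = pending.pop()
            if node in n_r:
                a_r.add((source, node))  # path ends at the next retained node
            elif node not in visited:
                visited.add(node)
                pending.extend(successors.get(node, ()))
    return n_r, a_r
```

For a hypothetical loop start, read, compute, write_detail, back to read, then write_total and stop, retaining only the start, stop and write nodes collapses the read and compute steps into direct arcs such as <start, write_detail> and <write_detail, write_detail>.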

Once the data file control graph is completed, at
step 50, information about the data written at each data
file write node is attached to the data file control
graph. For instance, the values or the type of each data
field written by each node are statically determined via
data flow in the control flow graph and are attached to
the nodes of the data file control graph.

At step 52, the paths from the start nodes through
the data file control graph to the stop nodes are
represented in a formal grammar. This formal grammar
with the attached data field information forms the report
data model. This model is an abstract representation of
the data files that can be written by the legacy program
applications and provides the basis on which a
modification specification can be written.

The report data model is presented in two parts.
First, each write node with its attached data field
information is presented as an incident. These incidents
are the most basic or leaf sub-expressions of the report
data model. Second, the non-leaf sub-expressions of the
report data model are presented as rules hierarchically
building up from the incidents.

The generation and presentation of a report data
model of legacy program applications may be illustrated
by consideration of a telephone bill example. Figure 4
depicts the printed output from a COBOL program for a
telephone bill. A typical COBOL program prints the
telephone bill in a predetermined format that may
include, for example, predetermined paper sizes and
column dimensions. The printing of the "TOTAL CALLS"
line in Figure 4 is the result of a computation of the
total number of calls, total time of the calls and the
total cost of the calls. As an example of a single node
of a control flow graph, the incident derived from COBOL
code for outputting the total calls line of Figure 4 is
as follows:

Incident 47 loc 414 record PRTEC from RS-LINE
<LINE.2>
" TOTAL CALLS:"
RECORDS-SELECTED-EDIT loc 266 pic Z,ZZ9 size 5
19: " TOTAL TIME:"
53: RS-HH loc 270 pic 99 size 2
71
RS-MM loc 272 pic 99 size 2
RS-SS loc 274 pic 99 size 2
RS-COST loc 276 pic $$$$$.99 size 8


Incident 47 describes the data written at the 
appropriate point in the program by the write instruction 
at line 414. The data include the headings of "TOTAL 
CALLS" and "TOTAL TIME" followed by the accumulated 
values for the total number of calls, the total time of 
calls and the total cost of calls. The constant values 
"TOTAL CALLS" and "TOTAL TIME" are determined by data 
flow analysis of the legacy application program. 
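
One way to picture an incident is as a record that lists, in output order, the constant text and the typed data fields written by a single write instruction. The following is a minimal Python sketch of that idea only; the class names `Constant`, `Field` and `Incident` and the helper `variant_fields` are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Constant:
    """Invariant text whose value was determined statically."""
    text: str


@dataclass
class Field:
    """Variant data field; only its type (picture and size) is known."""
    name: str
    loc: int
    pic: str
    size: int


@dataclass
class Incident:
    """One write operation, a leaf of the report data model."""
    number: int
    loc: int            # source line of the write instruction
    record: str
    source: str
    items: List[Union[Constant, Field]]


# Incident 47 as listed in the text (hypothetical encoding).
incident_47 = Incident(
    number=47, loc=414, record="PRTEC", source="RS-LINE",
    items=[
        Constant(" TOTAL CALLS:"),
        Field("RECORDS-SELECTED-EDIT", 266, "Z,ZZ9", 5),
        Constant(" TOTAL TIME:"),
        Field("RS-HH", 270, "99", 2),
        Field("RS-MM", 272, "99", 2),
        Field("RS-SS", 274, "99", 2),
        Field("RS-COST", 276, "$$$$$.99", 8),
    ],
)


def variant_fields(incident):
    """Names of the fields whose values are only known at run time."""
    return [item.name for item in incident.items if isinstance(item, Field)]
```

Separating constants from typed fields mirrors the distinction the text draws between statically determined values and data field types.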

The report data model includes grammar rules built 
up from the write incidents. Once each grammar rule is 
defined from the appropriate incidents and sub-rules, a 
report grammar describing the potential output of the 
legacy program applications for the bill shown in Figure 
4 is generated as follows:

Rule 23    [seq 3 4 5 6 7 8 9 10]
Rule 24    [? 23]
Rule 41    [seq 23 24 25]
Rule 42    [? 41]
Rule 45    [seq 0 1 2 42]
Rule 46    [? 45]
Rule 50    [seq 24 49]
Rule 51    [? 50]
Rule 61    [seq 24 47 48 51 23]
Rule 62    [? 61]
Rule 63    [seq 62 24 25]
Rule 64    [* 63]
Rule 78    [seq 46 64 24 47 48 50 65 66]
Root 79    [seq 78]

These grammar rules show how the write incidents are
combined to represent the output written by the legacy
application program. For example, rule 61 consists of
the sequence of sub-rules and incidents 24, 47, 48, 51,
and 23. Data described by each sub-rule or incident is
followed sequentially in the data file by the data
described by the next sub-rule or incident. That is, in
rule 61, data described by incident 47 is followed
immediately by data described by incident 48. Rule 62 is
a conditional rule indicating that data described by 61
may be written to the data file or skipped entirely.
Rule 64 is a repeating rule indicating that there is data
described by rule 63 that is repeated zero or more times.
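
The three rule forms (sequence, conditional and repeating) can be illustrated with a short Python sketch that flattens a rule into the ordered list of incidents it would write. This is an illustration under stated assumptions, not the modeling engine itself: the `RULES` table is transcribed from the grammar listed in this description, any number without a rule is treated as an incident, and the function `expand` with its `take_optional` and `repeats` parameters is hypothetical.

```python
# Grammar transcribed from the report data model; numbers with no
# entry here are treated as leaf incidents.
RULES = {
    23: ("seq", [3, 4, 5, 6, 7, 8, 9, 10]),
    24: ("?", [23]),
    41: ("seq", [23, 24, 25]),
    42: ("?", [41]),
    45: ("seq", [0, 1, 2, 42]),
    46: ("?", [45]),
    50: ("seq", [24, 49]),
    51: ("?", [50]),
    61: ("seq", [24, 47, 48, 51, 23]),
    62: ("?", [61]),
    63: ("seq", [62, 24, 25]),
    64: ("*", [63]),
    78: ("seq", [46, 64, 24, 47, 48, 50, 65, 66]),
    79: ("seq", [78]),
}


def expand(symbol, take_optional=False, repeats=0):
    """Flatten a rule into the incidents it writes, in output order.

    take_optional: whether every '?' rule is taken or skipped.
    repeats: how many times every '*' rule repeats.
    """
    if symbol not in RULES:
        return [symbol]                 # leaf incident
    kind, parts = RULES[symbol]
    if kind == "?":
        times = 1 if take_optional else 0
    elif kind == "*":
        times = repeats
    else:                               # "seq"
        times = 1
    result = []
    for _ in range(times):
        for part in parts:
            result.extend(expand(part, take_optional, repeats))
    return result


shortest = expand(79)    # every conditional skipped, no repetition
```

With every conditional rule skipped and no repetition, the root reduces to incidents 47, 48, 49, 65 and 66, with incident 47 immediately followed by incident 48.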



Referring now to Figure 5, data formatted according
to the XML schema of Figure 5A is depicted that provides
a data structure for the legacy computer output of Figure
4. The data falls within an opening tag of "<bill>" and
a closing tag of "</bill>". The "bill" schema includes a
"detail-list" subschema that, in turn, includes a
"detail-by-phone" subschema. Within the "detail-by-phone"
subschema separate tags are defined that report
the data from the TOTAL CALLS line of Figure 4. The
"total-bill-by-phone" subschema, the "total-time-by-phone"
subschema and the "total-calls" subschema define
the data printed in the TOTAL CALLS line of the legacy
computer system output.

Figure 5A depicts the XML bill schema used to output
the data in Figure 5. The root element of the schema is
the element type named "bill". Its subschemas are types
of the subelements. The detail-by-phone subschema of the
detail-list subschema of bill includes the data structure
reported in the TOTAL CALLS line of Figure 4.

Referring now to Figure 6, one example of a display
by the modeling/mapping graphical user interface 30
illustrates the mapping relationship between the XML
schema, the report data model and the underlying legacy
computer program application depicted as COBOL code in
Figure 6a. A grammar window 54 lists the report data
model grammar rules provided by the report data model of
the legacy program applications. An XML schema window 56
depicts the XML schema depicted by Figure 5 that is
representative of the legacy computer system output
depicted by Figure 4. A mapping window 58 depicts the
relationship between the variables of the legacy program
applications and the XML tags of the XML schema. For
instance, RS-TIME is a COBOL variable that is mapped to
the "total-time" tag of the XML schema. Rule 79
represents the root or beginning of the grammar provided
by the report data model shown above. Within the grammar
window, incident 47 falls under rule 78 as an incident
called to report the total cost from the legacy program
application.

Once a relationship is established between the
report data model and the XML schema, a modification
specification is written, and the generation of modified
legacy program applications is automatically performed.
The modified legacy program applications are designed to
report the data from the legacy computer system along
with XML schema tags that describe the nature of the
data. For instance, the following is incident 47 having
XML tag information and data field type and value
information annotated within it:

Incident 47 loc 414 record PRTEC from RS-LINE
<LINE 2>
   0: " TOTAL CALLS:" size 14
  14: RECORDS-SELECTED-EDIT loc 266 pic Z,ZZ9 size 5
        tag total-calls-by-phone
        id bill\detail-list\detail-by-phone\total-calls-by-phone
        type TAG when P
  19: "TOTAL TIME:" size 34
  53: RS-TIME loc 270 pic 99 size 2
        tag total-time-by-phone
        id bill\total-time
        type TAG when P
  71:
      RS-MM loc 272 pic 99 size 2
      ":" size 1
      RS-SS loc 274 pic 99 size 2
      "" size 2
      RS-COST loc 276 pic $$$$$.99 size 8
        tag total-cost
        id bill\total-cost
        type TAG when P
      "" size 2



The annotated incidents provide the basis for the
modification specification which is provided by mapping
engine 26 to code generation engine 24 for the creation
of modified legacy program applications. For instance,
the modification specification for incident 47 is:

    node(414, XML-TOTAL-CALLS-ID, 'total-calls-by-phone',
         'RECORDS-SELECTED-EDIT', 266).
    node(414, XML-TOTAL-TIME-ID, 'total-time-by-phone',
         'RS-TIME', 270).
    node(414, XML-TOTAL-BILL-ID, 'total-bill-by-phone',
         'RS-COST', 276).

Note that the data items RS-HH, RS-MM, and RS-SS have
been combined under data item RS-TIME.

Code generation engine 24 applies the modification
specification to determine the modifications needed for
the legacy code to output appropriate tags relating data
to the XML schema. For instance, the following code is
added by code generation engine 24 in accordance with the
modification specification in order to emit XML formatted
data from the modified legacy program applications that
relate to incident 47:

    MOVE RECORDS-SELECTED-EDIT TO XML-BUFFER
    MOVE XML-TOTAL-CALLS-ID TO XML-UID
    CALL 'XML' USING XML-UID
                     XML-BUFFER
    MOVE RS-TIME TO XML-BUFFER
    MOVE XML-TOTAL-TIME-ID TO XML-UID
    CALL 'XML' USING XML-UID
                     XML-BUFFER
    MOVE RS-COST TO XML-BUFFER
    MOVE XML-TOTAL-BILL-ID TO XML-UID
    CALL 'XML' USING XML-UID
                     XML-BUFFER
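
The translation from node entries of the modification specification to the added MOVE and CALL statements can be sketched in a few lines of Python. This is only an illustration of the mapping, assuming each entry is available as a tuple of (write line, identifier, tag, variable, variable location); the function `emit_cobol` and the list `SPEC_47` are hypothetical names, and a real code generation engine would also have to locate the insertion point in the legacy source.

```python
# Modification specification entries for incident 47, as tuples of
# (write line, XML identifier, tag, COBOL variable, variable location).
SPEC_47 = [
    (414, "XML-TOTAL-CALLS-ID", "total-calls-by-phone",
     "RECORDS-SELECTED-EDIT", 266),
    (414, "XML-TOTAL-TIME-ID", "total-time-by-phone", "RS-TIME", 270),
    (414, "XML-TOTAL-BILL-ID", "total-bill-by-phone", "RS-COST", 276),
]


def emit_cobol(spec):
    """Return the COBOL statements to add for each specification entry."""
    statements = []
    for _line, uid, _tag, variable, _loc in spec:
        # Copy the value, select the schema element, call the writer.
        statements.append(f"MOVE {variable} TO XML-BUFFER")
        statements.append(f"MOVE {uid} TO XML-UID")
        statements.append("CALL 'XML' USING XML-UID XML-BUFFER")
    return statements


generated = emit_cobol(SPEC_47)
```

Each entry yields the same three-statement pattern, which is why the specification only needs to name the variable and the schema element identifier.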

The modified legacy program application calls writer 
engine 20 to emit output with tags provided from the XML 
schema stored in context table 22. Once modified legacy 
program applications 18 are loaded onto legacy computer 
35 system 12, writer engine 20 in cooperation with context 
table 22 is called by modified legacy program 
applications 18 to output an XML data stream.

The pre-computed data necessary to control the
accurate writing of embedded XML elements is generated
from the XML schema. The pre-computed data consists of a
map from an index to depth, start-label, stop-label,
parent-index, and other information necessary to generate
correct XML. For instance, the XML schema depicted by
Figure 7 provides a data structure for printing a
customer's name, address and identification. Figure 7A
depicts the tree structure of the XML schema shown by
Figure 7. Figure 7B depicts the computed data structure
of the XML schema shown by Figure 7, including the depth
of each element corresponding to the element's position
in the tree structure and an index for each element
indicating its ancestor element. For instance, the
"Customer" element is the root of the XML schema and has
a descendant element of "Address". The "Street" element
is a descendant of the "Address" element, as indicated by
the number 3 corresponding to the identification of the
"Address" element.

Referring now to Figure 8, a flow diagram depicts
the process implemented in the writer engine to output an
XML data stream. The computed data depicted by Figure 7B
is applied to the writing of the XML data stream with
reference to the XML schema depicted by Figure 7. The
process begins at step 100 where an XML print command is
called along with identification of the schema element
and the value to be printed. For instance, the commands:

    MOVE '861 East Meadow' TO XML-BUFFER
    MOVE XML-CUSTOMER-STREET TO XML-UID
    CALL 'XML' USING XML-UID

provide the identification for the "Street" element of
the computed data structure.

At step 102, a test is made to see if the XML
printing process has been initiated to emit data. If
not, the appropriate data structure or current context is
initialized and the identified data file is opened at
step 104. For example, an XML print instruction relating
to customer data would result in initialization of the
current context that has "Customer" as the root element.
At step 106, a test is performed to determine whether all
data of the data structure has been emitted. If all data
is emitted, the process proceeds to step 108 where the
appropriate XML end tags are emitted and the data file is
closed. If, however, the node ID is not at the end of the
data structure, then the process proceeds to step 109.
For instance, if the node ID is "City" then the process
proceeds to step 109.

At step 109, a test is performed to determine
whether the called node ID is a descendant of the current
node. For instance, the "Street" element is a descendant
of the "Address" element. Thus, if the "Address" element
is the current element and the "Street" element is the
called element, then the process proceeds to step 110.
In contrast, if the current element is the "Name" element
and the called element is the "Street" element, then the
process proceeds to step 112 in order to locate the
nearest mutual ancestor node ID having consistent
cardinality with the called element. Thus, the mutual
ancestor of the "Name" and "Street" elements, the
"Customer" element, would be identified. At step 114 the
end tags are closed up to the "Customer" element, and the
process proceeds to step 110. The cardinality check at
step 112 ensures that, if an ancestor only permits a
single occurrence of a descendant, then the descendant is
only printed once. For example, if a descendant element
is emitted in successive occurrences, the cardinality
indicates that, between each emission of the descendant,
the ancestor element is closed and a new instance of the
ancestor is opened.

At step 110, tags are opened from the identified
ancestor down through the called node, and attributes of
the nodes along the tree structure are emitted along with
appropriate values. At step 116 the process returns to
step 100 to accept the next value in the XML data stream.
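
The core of steps 109 through 114 (find the nearest mutual ancestor, close tags down to it, then open tags down to the called element) can be sketched in Python as follows. This is a simplified illustration, not the writer engine: the `PARENT` map is an assumed customer schema consistent with the elements named in the text, the class `Writer` and its methods are hypothetical, and the cardinality check of step 112 is deliberately left out.

```python
# Assumed schema tree: each element maps to its parent (None for the root).
PARENT = {"Customer": None, "Name": "Customer", "Address": "Customer",
          "Street": "Address", "City": "Address"}


def path_to(element):
    """Chain of elements from the root down to `element`."""
    chain = []
    while element is not None:
        chain.append(element)
        element = PARENT[element]
    return chain[::-1]


class Writer:
    def __init__(self):
        self.open = []      # stack of currently open container elements
        self.out = []       # emitted text fragments

    def xml_print(self, element, value):
        target = path_to(element)[:-1]      # containers that must be open
        # Length of the shared prefix = position of the mutual ancestor.
        common = 0
        while (common < len(self.open) and common < len(target)
               and self.open[common] == target[common]):
            common += 1
        # Close end tags up to the mutual ancestor.
        while len(self.open) > common:
            self.out.append(f"</{self.open.pop()}>")
        # Open tags from the ancestor down to the called element.
        for name in target[common:]:
            self.out.append(f"<{name}>")
            self.open.append(name)
        self.out.append(f"<{element}>{value}</{element}>")

    def close(self):
        """Emit the remaining end tags and return the whole stream."""
        while self.open:
            self.out.append(f"</{self.open.pop()}>")
        return "".join(self.out)


writer = Writer()
writer.xml_print("Name", "John Doe")
writer.xml_print("Street", "861 East Meadow")
stream = writer.close()
```

Keeping the open elements on a stack means the mutual ancestor is simply the longest prefix the stack shares with the path to the called element.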

An additional function of writer engine 20 is the
delayed processing for writing of data as complete data
structures. For instance, writer engine 20 stores
attributes, values and text values to a data structure
without emitting the data until all of the
attributes, values and text values of the data structure
are complete. This delayed processing allows the writer
engine 20 to adhere to the sequencing requirements of the
XML schema.

The sample output below illustrates the need for 
this capability. 

SAMPLE OUTPUT

                                Send check payable to
John Doe                        ABC WIRELESS
111 Mizar Pl                    P. O. BOX 666666
Pasadena CA 93436-1204          DALLAS TX 75263-1111

Two addresses are printed side by side on the page.
One is the customer address and the other is the remitter
address. Thus, a single line of output contains
interleaved elements from two distinct subschemas,
according to the target XML schema shown below.

TARGET XML SCHEMA

<ElementType name="name"/>
<ElementType name="address"/>
<ElementType name="phone-number"/>
<ElementType name="city-state-zip"/>
<ElementType name="customer">
    <element type="name"/>
    <element type="address"/>
    <element type="city-state-zip"/>
</ElementType>
<ElementType name="remitter">
    <element type="name"/>
    <element type="address"/>
    <element type="city-state-zip"/>
</ElementType>
<ElementType name="bill-header">
    <element type="customer"/>
    <element type="remitter"/>
</ElementType>

A complete customer address subschema must be
emitted before the remitter address subschema. Due to
the structure of the legacy code (shown below) it is
necessary to buffer up the remitter address components
while writing the XML structure for the customer. In
addition to its other bookkeeping roles, the context
table provides storage for this buffering operation.
The original legacy code can be seen below:

FRAGMENT OF LEGACY COBOL DATA DECLARATIONS

    05 HL-BILL-HEADER-10.
       10 FILLER                     PIC X(49) VALUE SPACES.
       10 FILLER                     PIC X(32) VALUE
                                     "Send check payable to".
    05 HL-BILL-HEADER-11.
       10 FILLER                     PIC X     VALUE SPACES.
       10 HLS-CUSTOMER-NAME          PIC X(40) VALUE SPACES.
       10 HLS-REMITTANCE-NAME        PIC X(40) VALUE SPACES.
    05 HL-BILL-HEADER-12.
       10 FILLER                     PIC X     VALUE SPACES.
       10 HLS-CUSTOMER-ADDRESS       PIC X(40) VALUE SPACES.
       10 HLS-REMITTANCE-ADDRESS     PIC X(40) VALUE SPACES.
    05 HL-BILL-HEADER-13.
       10 FILLER                     PIC X     VALUE SPACES.
       10 HLS-CT-ST-ZIP              PIC X(40) VALUE SPACES.
       10 HLS-REMITTANCE-CT-ST-ZIP   PIC X(40) VALUE SPACES.



FRAGMENT OF LEGACY COBOL PROCEDURAL CODE

    WRITE BILL-RECORD FROM HL-BILL-HEADER-10 AFTER 2
    WRITE BILL-RECORD FROM HL-BILL-HEADER-11
    WRITE BILL-RECORD FROM HL-BILL-HEADER-12
    WRITE BILL-RECORD FROM HL-BILL-HEADER-13



The modified code is shown below, with comments
describing the successive operations.

MODIFIED LEGACY COBOL PROCEDURAL CODE

   * Unchanged, since it does not emit anything
   * relevant to the schema
     WRITE BILL-RECORD FROM HL-BILL-HEADER-10 AFTER 2
   * Emit the customer name
     MOVE HLS-CUSTOMER-NAME TO XML-VALUE
     MOVE CUSTOMER-NAME-ID TO XML-TAG
     CALL "XML" USING XML-TAG XML-VALUE
   * Deferred write of remitter name
     MOVE HLS-REMITTANCE-NAME TO XML-VALUE
     MOVE REMITTER-NAME-ID TO XML-TAG
     CALL "XML-SET-NODE-VALUE" USING XML-TAG XML-VALUE
     WRITE BILL-RECORD FROM HL-BILL-HEADER-11
   * Emit the customer address
     MOVE HLS-CUSTOMER-ADDRESS TO XML-VALUE
     MOVE CUSTOMER-ADDRESS-ID TO XML-TAG
     CALL "XML" USING XML-TAG XML-VALUE
   * Deferred write of remitter address
     MOVE HLS-REMITTANCE-ADDRESS TO XML-VALUE
     MOVE REMITTER-ADDRESS-ID TO XML-TAG
     CALL "XML-SET-NODE-VALUE" USING XML-TAG XML-VALUE
     WRITE BILL-RECORD FROM HL-BILL-HEADER-12
   * Emit customer city-state-zip
     MOVE HLS-CT-ST-ZIP TO XML-VALUE
     MOVE CUSTOMER-CITY-STATE-ZIP-ID TO XML-TAG
     CALL "XML" USING XML-TAG XML-VALUE
   * Deferred write of remitter city-state-zip
     MOVE HLS-REMITTANCE-CT-ST-ZIP TO XML-VALUE
     MOVE REMITTER-CITY-STATE-ZIP-ID TO XML-TAG
     CALL "XML-SET-NODE-VALUE" USING XML-TAG XML-VALUE
     WRITE BILL-RECORD FROM HL-BILL-HEADER-13
   * Write of deferred remitter node with subnodes.
     MOVE XML-REMITTER-ID TO XML-TAG
     CALL "XML-WRITE-NODE" USING XML-TAG

The resulting output for this particular example can 
be seen below.

XML OUTPUT

<bill-header>
  <customer>
    <name>John Doe</name>
    <address>111 Mizar Pl</address>
    <city-state-zip>Pasadena CA 93436-1204</city-state-zip>
  </customer>
  <remitter>
    <name>ABC WIRELESS</name>
    <address>P. O. BOX 666666</address>
    <city-state-zip>DALLAS TX 75263-1111</city-state-zip>
  </remitter>
</bill-header>
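
The deferred-write behavior can be illustrated with a small Python sketch in which customer values are emitted immediately while remitter values are buffered and flushed later in schema order. It is a sketch under stated assumptions: the class `DeferringWriter` and its methods `xml`, `set_node_value`, `write_node` and `finish` are hypothetical stand-ins for the three writer entry points used in the modified COBOL code, and the enclosing bill-header element is omitted for brevity.

```python
class DeferringWriter:
    def __init__(self, schema):
        self.schema = schema    # parent element -> ordered child names
        self.pending = {}       # parent -> {child: value}, not yet written
        self.out = []
        self.open = None        # currently open parent element, if any

    def xml(self, parent, child, value):
        """Emit a value now, opening its parent on first use."""
        if self.open != parent:
            if self.open is not None:
                self.out.append(f"</{self.open}>")
            self.out.append(f"<{parent}>")
            self.open = parent
        self.out.append(f"<{child}>{value}</{child}>")

    def set_node_value(self, parent, child, value):
        """Buffer a value; nothing is written yet."""
        self.pending.setdefault(parent, {})[child] = value

    def write_node(self, parent):
        """Flush a buffered node, children in schema order."""
        values = self.pending.pop(parent)
        for child in self.schema[parent]:
            if child in values:
                self.xml(parent, child, values[child])

    def finish(self):
        if self.open is not None:
            self.out.append(f"</{self.open}>")
            self.open = None
        return "".join(self.out)


schema = {"customer": ["name", "address", "city-state-zip"],
          "remitter": ["name", "address", "city-state-zip"]}
w = DeferringWriter(schema)
w.xml("customer", "name", "John Doe")
w.set_node_value("remitter", "name", "ABC WIRELESS")
w.xml("customer", "address", "111 Mizar Pl")
w.set_node_value("remitter", "address", "P. O. BOX 666666")
w.xml("customer", "city-state-zip", "Pasadena CA 93436-1204")
w.set_node_value("remitter", "city-state-zip", "DALLAS TX 75263-1111")
w.write_node("remitter")
result = w.finish()
```

Although customer and remitter values arrive interleaved, line by line, the complete customer element is closed before the remitter element is opened.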

An XML schema may impose cardinality constraints on
the component elements. For example, in the schema below
C, C1 and C2 may each appear only once within their
respective parents. It is important to ensure this
property when producing an instance of this schema.

<ElementType name="C1"/>
<ElementType name="C2"/>
<ElementType name="C">
    <element type="C1" maxOccurs="1"/>
    <element type="C2" maxOccurs="1"/>
</ElementType>
<ElementType name="A">
    <element type="C" maxOccurs="1"/>
</ElementType>

Some of the precomputed elements of the context
table that represent the schema rooted at "A" are shown
in the table below.

ID   Label   Depth   Parent   Cardinality
1    <A>     1       0        n
2    <C>     2       1        1
3    <C1>    3       2        1
4    <C2>    3       2        1

The ID column holds the unique identifier associated with
each element. The Cardinality column indicates a
constraint on the number of occurrences of an element
within its parent. 'n' means there may be zero or more.
'1' indicates that there should be exactly 1.
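
The rows of such a table can be derived by a pre-order walk of the schema tree. The Python sketch below is a minimal illustration, assuming the schema is supplied as nested (name, cardinality, children) tuples; the names `SCHEMA` and `build_context_table` are hypothetical, and only the five columns shown in the table are computed.

```python
# Schema rooted at "A" as (name, cardinality, children) tuples.
SCHEMA = ("A", "n", [
    ("C", "1", [
        ("C1", "1", []),
        ("C2", "1", []),
    ]),
])


def build_context_table(tree):
    """Return rows of (ID, Label, Depth, Parent, Cardinality)."""
    rows = []

    def visit(node, depth, parent_id):
        name, cardinality, children = node
        row_id = len(rows) + 1          # IDs assigned in pre-order
        rows.append((row_id, f"<{name}>", depth, parent_id, cardinality))
        for child in children:
            visit(child, depth + 1, row_id)

    visit(tree, 1, 0)                   # root has depth 1 and parent 0
    return rows


table = build_context_table(SCHEMA)
```

Because the table is computed once from the schema, each print call at run time can look up depth and parent by index rather than traversing the schema again.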

The table below shows how this information is used
dynamically as XML-PRINT commands are executed. (Note
that the COUNT column of the CONTEXT shows the change in
the value of the cardinality count with respect to a
particular schema element.)

CONTEXT

STATE   STACK   COUNT   COMMAND              OUTPUT
0       []      A=1     XML-PRINT C1, V11    <A>
1       [A]     C=1                          <C>
2       [A,C]   C1=1                         <C1>V11</C1>
3       [A,C]   C2=1    XML-PRINT C2, V21    <C2>V21</C2>
4       [A,C]   C1=0    XML-PRINT C1, V12    </C>
                C2=0
5       [A]     C=0                          </A>
6       []      A=2                          <A>
7       [A]     C=1                          <C>
8       [A,C]   C1=1                         <C1>V12</C1>

The initial state, 0, includes an empty stack and no
cardinality counts associated with any schema element.
The command to print V11 as a schema element C1 causes a
check of the state, the output of the <A> and <C>
ancestor labels, and the output of the labeled V11
element. The STACK is modified to record the current
context of an open <A> and <C> and the cardinality counts
for A, C and C1 are set to 1.

The command to print V21 as a schema element C2
causes a check of the state. The STACK as regards the
ancestors of C2 is correct, so the only printing
operation is the output of the labeled V21 element. The
STACK is unchanged. The cardinality count for C2 is set
to 1.

The command to print V12 labeled by schema element
C1 causes a check of the state. The STACK in state 3 as
regards the ancestors of C1 is correct. However, the
cardinality count for C1 is equal to 1 which is the
permitted cardinality of elements of this type. We
therefore close C and reset the cardinality counts for
its children, C1 and C2. At this point it can be seen
that the cardinality count for C is equal to 1 which is
the permitted cardinality of elements of this type. We
therefore close A and reset the cardinality count for C
to 0. At this point (state 6) the stack is empty, and we
output the ancestor labels <A> and <C>, output the
labeled V12 element, modify the STACK to record the
current context of an open <A> and <C> and set the
cardinality counts for C and C1 to 1 and A to 2.

Now, consider the case where the maximum occurrence 
of elements of type C has no upper bound. That is, the 
element definition of C within A is changed to:

<element type="C" maxOccurs="n"/>

The third print step now becomes simpler, as shown in the
table below:

CONTEXT

STATE   STACK   COUNT   COMMAND              OUTPUT
0       []      A=1     XML-PRINT C1, V11    <A>
1       [A]     C=1                          <C>
2       [A,C]   C1=1                         <C1>V11</C1>
3       [A,C]   C2=1    XML-PRINT C2, V22    <C2>V22</C2>
4       [A,C]   C1=0    XML-PRINT C1, V12    </C>
                C2=0
5       [A]     C=2                          <C>
6       [A,C]   C1=1                         <C1>V12</C1>

The first two XML-PRINT operations proceed as
before. Because there may be an arbitrary number of C
subelements of A there is no need to close the A and open
a new one. We close C, setting the STACK to [A], and
reset the cardinality counts for C's descendants, C1 and
C2. We open a new C and increment C's cardinality count
to 2. Finally the labeled V12 element is output, and the
cardinality count for C1 is set to 1.

Finally, contrast the previous examples to the case
where there is no upper bound on the occurrence of any
element. That is, the element definitions of C, C1 and
C2 are changed to:

<element type="C1" maxOccurs="n"/>
<element type="C2" maxOccurs="n"/>
<element type="C" maxOccurs="n"/>

The state changes as seen in the table below:

CONTEXT

STATE   STACK   COUNT   COMMAND              OUTPUT
1       []      A=1     XML-PRINT C1, V11    <A>
2       [A]     C=1                          <C>
3       [A,C]   C1=1                         <C1>V11</C1>
4       [A,C]   C2=1    XML-PRINT C2, V22    <C2>V22</C2>
5       [A,C]   C1=2    XML-PRINT C1, V12    <C1>V12</C1>

The first and second calls work as before. The
third call becomes even simpler. Because there may be an
arbitrary number of C1 subelements of C there is no need
to close the C and open a new one. The labeled V12
element is output, and the cardinality count for C1 is
incremented to 2.
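
The three cardinality cases can be reproduced with a compact Python sketch that keeps a stack of open elements and a count per element. This is an illustrative reading of the tables, not the writer engine: the class `CardinalityWriter`, the helper `run` and the `max_occurs` map (with `None` standing for 'n') are hypothetical, and the same three print commands, with value names taken from the first table, are used for every case.

```python
PARENT = {"A": None, "C": "A", "C1": "C", "C2": "C"}


def path_to(element):
    """Chain of elements from the root down to `element`."""
    chain = []
    while element is not None:
        chain.append(element)
        element = PARENT[element]
    return chain[::-1]


class CardinalityWriter:
    def __init__(self, max_occurs):
        self.max_occurs = max_occurs    # element -> limit, None means 'n'
        self.stack = []                 # open container elements
        self.count = {name: 0 for name in PARENT}
        self.out = []

    def _full(self, name):
        limit = self.max_occurs[name]
        return limit is not None and self.count[name] >= limit

    def _close_to(self, keep):
        """Close elements above depth `keep`, resetting child counts."""
        while len(self.stack) > keep:
            closed = self.stack.pop()
            self.out.append(f"</{closed}>")
            for child, parent in PARENT.items():
                if parent == closed:
                    self.count[child] = 0

    def xml_print(self, element, value):
        path = path_to(element)
        # How much of the open stack matches the path to the element.
        keep = 0
        while (keep < len(self.stack) and keep < len(path) - 1
               and self.stack[keep] == path[keep]):
            keep += 1
        # Cardinality check: back up while the next element is at its limit.
        while keep > 0 and self._full(path[keep]):
            keep -= 1
        self._close_to(keep)
        for name in path[keep:-1]:
            self.out.append(f"<{name}>")
            self.stack.append(name)
            self.count[name] += 1
        self.out.append(f"<{element}>{value}</{element}>")
        self.count[element] += 1


def run(max_occurs):
    writer = CardinalityWriter(max_occurs)
    writer.xml_print("C1", "V11")
    writer.xml_print("C2", "V21")
    writer.xml_print("C1", "V12")
    return "".join(writer.out)


all_single = run({"A": None, "C": 1, "C1": 1, "C2": 1})
c_unbounded = run({"A": None, "C": None, "C1": 1, "C2": 1})
all_unbounded = run({"A": None, "C": None, "C1": None, "C2": None})
```

The only difference between the three runs is how far the third print command has to back up the stack before it can emit the second C1 value.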

When modifying legacy code, certain difficulties
arise in deciding when to print schema data that is
contained in headers and footers. Consider the example
of telephone invoices. The output of an invoicing
program may consist of a sequence of invoices. Each
invoice may take up a single page or multiple pages.
When the invoice occupies multiple pages, its header is
typically repeated. As a result, sometimes the header is
introducing a new invoice schema element, and at other
times it is mere page decoration of the human readable
output. In order to recognize the need to close the
current invoice tag and open a new one, it is necessary
to know that there is some unique identifier associated
with each invoice instance and that when the value of
this 'key' changes, the current invoice is closed and a
new one opened. To enable this computation the context
table contains a boolean identifier for key elements and
the current values for these elements. This check is
performed at the same time as the cardinality check.
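
The key check can be sketched in Python as a comparison between the key value carried by each printed header and the value last recorded. This is a minimal illustration only: the class `InvoiceTracker`, the `invoice` element name and the `id` attribute are hypothetical, and the sketch ignores the cardinality check that the text says is performed at the same time.

```python
class InvoiceTracker:
    def __init__(self):
        self.current_key = None     # key value of the invoice now open
        self.out = []

    def header(self, key):
        """Called each time the legacy code prints a page header."""
        if key == self.current_key:
            return                  # repeated header: page decoration only
        if self.current_key is not None:
            self.out.append("</invoice>")
        # The element and attribute names here are illustrative.
        self.out.append(f'<invoice id="{key}">')
        self.current_key = key

    def finish(self):
        if self.current_key is not None:
            self.out.append("</invoice>")
            self.current_key = None
        return self.out


tracker = InvoiceTracker()
for page_key in ["1001", "1001", "1002"]:   # invoice 1001 spans two pages
    tracker.header(page_key)
events = tracker.finish()
```

The second header for invoice 1001 produces no output, while the change of key to 1002 closes the first invoice and opens the next.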

The present invention has a number of important
business applications that relate to e-commerce and to
more efficient use of legacy computer reports by
brick-and-mortar businesses. One example is that internal
reports otherwise printed on paper for manual inspection
are instead available for storage on a database in XML
format. Once electronically stored, the reports are
available as electronic information assets for review by
a browser or other electronic analysis. The reports are
also much simpler to store in a data warehouse.

Another commercial application is as Enterprise
Application Integration (EAI) middleware for transfer of
data between applications. Setting up transfer of data
from structured databases, such as those using XML
formats, is relatively straightforward since data
definitions may be treated as semantic tags. In
contrast, typical legacy computer system reports are
unstructured since they represent data generated
according to business logic instead of a data structure.
By modifying underlying legacy applications to directly
output XML formatted data, the outputted data is more
easily treated as structured data files for integration
in a suite of enterprise applications.

Another commercial application is Electronic Bill
Presentment and Payment (EBPP). In order to provide
electronic billing from typical legacy computer systems,
a parser is generally used to parse untagged invoice data
files and then tag the data files with semantically
meaningful identifiers. Parsers are expensive and
difficult to set up and maintain. In contrast,
modification of underlying legacy computer system code to
directly output XML formatted data saves time, requires
less expertise and expense, and provides data in a
recognized format for e-commerce. Thus, businesses with
legacy computer systems may output XML formatted reports
that allow the business to take advantage of advances
taking place in e-commerce, such as automatic bill
payment. For instance, individual telephone customers
could receive their telephone bill by e-mail containing a
web link to a site that provides the individual's bill
detail.

Another commercial application is archival of
billing statements. Banks, for example, maintain large
archives of customer billing statements as reduced
photographic copies on microfiche or as print streams on
optical disk systems. Retrieval systems for these
archives are complex and difficult to maintain. Data
extraction from the print streams is a recent
improvement, as disclosed in U.S. Patent No. 6,031,625
(US6031625), but such a system still requires processing
of print streams after they have been output from the
legacy application. In contrast, modifying the
underlying legacy computer code so it directly produces
XML formatted billing statements makes archiving and
retrieval of billing statements much simpler. For
example, the XML statements can be stored in a relational
database for easy retrieval. In addition, the retrieved
statements, because they have an XML representation,
become directly viewable, for example, using browser
technology.

Another commercial application is in business
intelligence, which seeks to analyze electronic
information assets to determine business behaviors, such
as purchasing or selling behaviors. Syndicated data
providers obtain data for intelligence analysis through
reports that are parsed on a distributor or purchaser
basis. This detailed parsing can be even more
complicated than the parsing used to support EBPP
function. Thus, direct generation of XML formatted data
from a legacy computer system providing invoice reports
is even more efficient in the business intelligence role
than in electronic billing and other applications since
detailed data analysis is available without applying
detailed parsing systems.

Overall, the direct generation of XML formatted data
from a legacy computer system reduces friction in
information networks by making the transfer of
information simpler. This reduces the cost of tracking
information, the manual effort to exchange and analyze
business information, and the time associated
with obtaining valuable business intelligence from
existing data sources. By making data available in
semantically meaningful form, customers can automatically
analyze their suppliers for Vendor Relationship
Management, suppliers can automatically analyze their
customers for Customer Relationship Management, and
manufacturers can automatically analyze markets for their
products for Market Intelligence.

Although the present invention has been described in
detail, it should be understood that various changes,
substitutions and alterations can be made hereto without
departing from the spirit and scope of the invention as
defined by the appended claims.

WHAT IS CLAIMED IS:

1. A method for modeling a legacy computer system
comprising:
    identifying incidents of applications of the
legacy computer system that output data; and
    defining a control flow graph of the output
incidents.

2. The method of Claim 1 further comprising:
    identifying the value or type of the data fields
associated with each output incident; and
    attaching the value or type to the control flow
graph.

3. The method of Claim 2 wherein identifying the
value or type further comprises:
    identifying output incidents of invariant data
fields; and
    attaching the value of each invariant data field to
its associated control flow graph incident.

4. The method of Claim 2 wherein identifying the
value or type further comprises:
    identifying output incidents of variant data fields;
and
    attaching the type of each variant data field to its
associated control flow graph incident.



WO 01/67290 



PCT/US01/07239 



5. The method of Claim 1 wherein the control flow
graph comprises:
    plural nodes having associated arcs, each node
associated with an output incident.

6. The method of Claim 5 wherein a complete
control flow graph of the application $(N, A)$ is used to
compute a directed graph $(N_R, A_R)$ wherein:
    $n$ comprises a node in $N_R$ if $n$, an element of $N$,
starts an output process, stops an output process or
outputs data; and
    $\langle n_1, n_m \rangle$ comprises an arc in $A_R$ if $n_1$ and $n_m$
are in $N_R$ and a sequence of arcs $\langle n_1, n_2 \rangle$,
$\langle n_2, n_3 \rangle$, ..., $\langle n_{m-1}, n_m \rangle$ is in $A$ such that
for $i$ from 2 to $m-1$, $n_i$ is not in $N_R$.

7. The method of Claim 6 further comprising:
    defining the control flow graph as a formal grammar
that describes the flow paths from each start command to
the associated stop commands.

8. The method of Claim 1 further comprising:
    associating the incidents with an Extensible Markup
Language schema; and
    creating a specification to modify the legacy
computer system applications to provide output in
Extensible Markup Language format.

9. The method of Claim 8 further comprising:
    automatically modifying the legacy computer system
applications in accordance with the specification.

10. A system for modeling an output application of
a legacy computer system comprising:
    a modeling engine interfaced with the legacy
computer system, the modeling engine operable to analyze
an application loaded on the legacy computer system to
identify incidents within the application that output
data from the legacy computer system; and
    a control flow graph of the output operations within
the applications.

11. The system of Claim 10 wherein the control flow 
graph comprises plural nodes, each node associated with 
an output incident. 

12. The system of Claim 11 wherein a complete
control flow graph of the application $(N, A)$ is used to
compute a directed graph $(N_R, A_R)$ wherein:
    $n$ comprises a node in $N_R$ if $n$, an element of $N$,
starts an output process, stops an output process or
outputs data; and
    $\langle n_1, n_m \rangle$ comprises an arc in $A_R$ if $n_1$ and $n_m$
are in $N_R$ and a sequence of arcs $\langle n_1, n_2 \rangle$,
$\langle n_2, n_3 \rangle$, ..., $\langle n_{m-1}, n_m \rangle$ is in $A$ such that
for $i$ from 2 to $m-1$, $n_i$ is not in $N_R$.

13. The system of Claim 10 wherein the control flow
graph of the output operations comprises a formal
grammar that describes the flow paths from each start
command to the associated stop commands.

14. The system of Claim 10 further comprising a
graphical user interface in communication with the
modeling engine, the graphical user interface operable to
display the control flow graph formal grammar and the
incidents.

15. The system of Claim 14 wherein the graphical
user interface further communicates with a mapping engine
and an Extensible Markup Language schema, the mapping
engine operable to map the incidents of the applications
with the control flow graph formal grammar and the
Extensible Markup Language schema.
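
The reduced graph recited in Claims 6 and 12 can be illustrated with a short Python sketch: keep only the nodes that start, stop or perform output, and connect two kept nodes whenever a path joins them through nodes that are not kept. This is a minimal sketch under stated assumptions; the function `reduce_graph`, its `is_output` predicate and the small example graph are hypothetical and are not taken from the claimed system.

```python
def reduce_graph(nodes, arcs, is_output):
    """Compute (N_R, A_R) from a complete control flow graph (N, A).

    nodes: iterable of node identifiers (N)
    arcs: set of (source, destination) pairs (A)
    is_output: predicate that is true for nodes that start an output
               process, stop an output process or output data
    """
    successors = {n: [] for n in nodes}
    for src, dst in arcs:
        successors[src].append(dst)

    kept = {n for n in nodes if is_output(n)}
    reduced = set()
    for start in kept:
        seen = set()
        frontier = list(successors[start])
        while frontier:
            node = frontier.pop()
            if node in kept:
                reduced.add((start, node))   # path ends at a kept node
            elif node not in seen:
                seen.add(node)               # pass through a dropped node
                frontier.extend(successors[node])
    return kept, reduced


# Example: nodes 1 and 4 write data; 2, 3 and 5 do not.
example_nodes = [1, 2, 3, 4, 5]
example_arcs = {(1, 2), (2, 3), (3, 4), (2, 5), (5, 4), (4, 1)}
n_r, a_r = reduce_graph(example_nodes, example_arcs, lambda n: n in {1, 4})
```

In the example, both paths from node 1 to node 4 collapse into a single arc, and the direct arc from node 4 back to node 1 is preserved.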
Date 01/25/2000                    American Telephone Company

Monthly Statement for Account Number: 1111111111

COMPANY: EDS
PHONE NUMBER: (214) 999-1212
PERSON: Suzie Q Sample-Data

Date        Time    Rate  Number        City and State  Duration  Cost
12/01/1999  07:15A  B     210-404-6690  San Antoni TX   00:12.2   $1.22
12/02/1999  01:00A  A     210-404-6690  San Antoni TX   00:01.0   $.05
12/02/1999  17:45P  D     919-416-1212  Kill Devil NC   00:00.3   $.06
12/03/1999  15:01P  C     210-404-6690  San Antoni TX   00:02.0   $.30
12/03/1999  20:23P  D     919-462-1212  Kill Devil NC   03:02.3   $36.46
12/04/1999  06:06A  B     615-655-1122  Nashville TN    00:00.5   $.05
12/07/1999  04:00A  A     210-404-6690  San Antoni TX   01:00.0   $3.00
12/07/1999  15:05P  C     615-655-1122  Nashville TN    00:40.5   $6.07
12/07/1999  15:45P  C     205-555-1234  Dothan AL       00:04.3   $.64
12/11/1999  02:13A  A     615-655-1122  Nashville TN    00:30.0   $1.50
12/11/1999  08:08A  B     210-404-6690  San Antoni TX   02:20.5   $14.05
12/13/1999  08:00A  B     210-404-6690  San Antoni TX   01:50.0   $11.00
12/21/1999  00:26A  A     210-404-6690  San Antoni TX   00:31.3   $1.56
12/21/1999  04:12A  A     919-416-1212  Kill Devil NC   00:32.0   $1.60
12/21/1999  18:23P  D     615-655-1122  Nashville TN    00:23.3   $4.67
12/21/1999  19:01P  D     210-404-6690  San Antoni TX   03:02.4   $36.48
12/22/1999  08:04A  B     205-555-1234  Dothan AL       00:43.2   $4.32
12/27/1999  12:01A  C     205-555-1234  Dothan AL       00:13.6   $2.04
12/27/1999  21:12P  D     205-555-1234  Dothan AL       01:03.0   $12.60

TOTAL CALLS: 19    TOTAL TIME: 16:12:49    $137.67

FIG. 4

<bill>
  <account>1111111111</account>
  <customer-name>EDS</customer-name>
  <billing-date>1/26/2000</billing-date>
  <total-bill-by-phone>69.66</total-bill-by-phone>
  <total-time-by-phone>515.6</total-time-by-phone>
  <total-calls>34</total-calls>
  <detail-list>
    <detail-by-phone>
      <phone-number>999-1212</phone-number>
      <area-code>214</area-code>
      <phone-user>Suzie Q Sample-Data</phone-user>
      <calls-by-phone>
        <call>
          <date>12-01-1999</date>
          <time>07.15A</time>
          <code>B</code>
          <phone-number>404-6690</phone-number>
          <area-code>210</area-code>
          <called-city>San Antonio</called-city>
          <called-state>TX</called-state>
          <duration>12.2</duration>
          <charge>1.22</charge>
        </call>
        <call>
          <date>12-01-1999</date>
          <time>01.00A</time>
          <code>A</code>
          <phone-number>404-6690</phone-number>
          <area-code>210</area-code>
          <called-city>San Antonio</called-city>
          <called-state>TX</called-state>
          <duration>01.0</duration>
          <charge>0.05</charge>
        </call>
        <call>
          <date>12-01-1999</date>
          <time>17.45P</time>
          <code>D</code>
          <phone-number>416-1212</phone-number>
          <area-code>919</area-code>
          <called-city>Kill Devil</called-city>
          <called-state>NC</called-state>
          <duration>0.3</duration>
          <charge>0.06</charge>
        </call>
      </calls-by-phone>
    </detail-by-phone>
  </detail-list>
</bill>

FIG. 5
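
The following sketch illustrates how the fields of one call could be emitted directly as a `<call>` element of the kind shown above, rather than being written as a report line and transformed afterwards. It is a minimal illustration only: the function `build_call`, the `CALL_FIELDS` list, and the use of Python's standard `xml.etree.ElementTree` module are assumptions and do not represent the writer engine described in this application.

```python
import xml.etree.ElementTree as ET

# Child elements of <call>, in the order they appear in the sample document.
CALL_FIELDS = ["date", "time", "code", "phone-number", "area-code",
               "called-city", "called-state", "duration", "charge"]

def build_call(values):
    """Build a <call> element from a mapping of field name to text value."""
    call = ET.Element("call")
    for name in CALL_FIELDS:
        ET.SubElement(call, name).text = values[name]
    return call

# Field values of the first call in the sample document.
first_call = {
    "date": "12-01-1999", "time": "07.15A", "code": "B",
    "phone-number": "404-6690", "area-code": "210",
    "called-city": "San Antonio", "called-state": "TX",
    "duration": "12.2", "charge": "1.22",
}
xml_text = ET.tostring(build_call(first_call), encoding="unicode")
print(xml_text)
```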

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="accountno" content="textonly"/>
  <ElementType name="customer-name" content="textonly"/>
  <ElementType name="billing-date" content="textonly"/>
  <ElementType name="total-cost" content="textonly"/>
  <ElementType name="total-number-calls" content="textonly"/>
  <ElementType name="total-time" content="textonly"/>
  <ElementType name="phone-number" content="textonly"/>
  <ElementType name="phone-user" content="textonly"/>
  <ElementType name="date" content="textonly"/>
  <ElementType name="charge" content="textonly"/>
  <ElementType name="called-city" content="textonly"/>
  <ElementType name="called-state" content="textonly"/>
  <ElementType name="duration" content="textonly"/>
  <ElementType name="code" content="textonly"/>
  <ElementType name="area-code" content="textonly"/>
  <ElementType name="time" content="textonly"/>
  <ElementType name="call">
    <element type="area-code" maxOccurs="1"/>
    <element type="phone-number" maxOccurs="1"/>
    <element type="code" maxOccurs="1"/>
    <element type="date" maxOccurs="1"/>
    <element type="time" maxOccurs="1"/>
    <element type="duration" maxOccurs="1"/>
    <element type="charge" maxOccurs="1"/>
    <element type="called-city" maxOccurs="1"/>
    <element type="called-state" maxOccurs="1"/>
  </ElementType>
  <ElementType name="total-bill-by-phone" content="textonly"/>
  <ElementType name="total-time-by-phone" content="textonly"/>
  <ElementType name="total-calls-by-phone" content="textonly"/>
  <ElementType name="calls-by-phone">
    <ElementType name="call"/>
  </ElementType>
  <ElementType name="detail-by-phone">
    <element type="phone-number"/>
    <element type="area-code"/>
    <element type="phone-user"/>
    <element type="total-bill-by-phone"/>
    <element type="total-time-by-phone"/>
    <element type="total-calls-by-phone"/>
    <element type="calls-by-phone"/>
  </ElementType>
  <ElementType name="detail-list">
    <element type="detail-by-phone"/>
  </ElementType>
  <ElementType name="bill">
    <element type="accountno"/>
    <element type="customer-name"/>
    <element type="billing-date"/>
    <element type="total-cost"/>
    <element type="total-number-calls"/>
    <element type="total-time"/>
    <element type="detail-list"/>
  </ElementType>
</Schema>

FIG. 6A

Source File Viewer

D:\xml-annotation\XML Annotator\Sample Data\cobol\phonebill.cbl

56900 *
57000 /*******************************************************************
57100 *                            SORT                                  *
57200 ********************************************************************
57300 *BUILD-SORT-TEMP-FILE.
57400 *    BUILD TEMPFILE HERE.
57500 *
57600 /*******************************************************************
57700 *                         TERMINATION                              *
57800 ********************************************************************
57900  TERMINATION.
58000      PERFORM PRINT-A-CONTROL-BREAK-2.                         94188JHA
58100      MOVE RECORDS-SELECTED-TOTAL TO RECORDS-SELECTED-EDIT.    94190JHA
58200      ADD 1 TO PAGE-COUNT.
58300      MOVE PAGE-COUNT TO REPORT-PAGE-NO.
58400      WRITE PRTREC FROM R1-TITLE AFTER ADVANCING PAGE.         00003JHA
58500      MOVE "GRAND TOTAL:" TO RS-MSG.
58600      MOVE GT-DURATION TO HOLD-DURATION.
58700      PERFORM ACCUM-DURATION.
58800      MOVE WS-HH TO RS-HH
58900      MOVE WS-MM TO RS-MM.
59000      MOVE WS-SS TO RS-SS.
59100      MOVE GT-COST TO RS-COST.
59200      WRITE PRTREC FROM RS-LINE AFTER ADVANCING 5 LINES.
59300      CLOSE TEMPFILE.
59400      CLOSE PRINT.
59500 *
59600 /******************************************************************* 94190JHA
59700 *                  DURATION COMPUTATION AREA                  *94190JHA

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="Current" content="boolean"/>
  <ElementType name="Name" content="textonly"/>
  <ElementType name="ID" content="textonly"/>
  <ElementType name="Street" content="textonly"/>
  <ElementType name="City" content="textonly"/>
  <ElementType name="State" content="textonly"/>
  <ElementType name="ZIP" content="textonly"/>
  <ElementType name="Address">
    <Attribute type="Current"/>
    <Element type="Street"/>
    <Element type="City"/>
    <Element type="State"/>
    <Element type="ZIP"/>
  </ElementType>
  <ElementType name="Customer">
    <Element type="Name"/>
    <Element type="Address"/>
    <Element type="ID"/>
  </ElementType>
</Schema>

FIG. 7